Commit 25b47cb — Feature/SK-855 | Monai Example (#620)
sztoor authored Jun 10, 2024
1 parent 9fd4442 commit 25b47cb
Showing 12 changed files with 698 additions and 0 deletions.
4 changes: 4 additions & 0 deletions examples/monai-2D-mednist/.dockerignore
data
seed.npz
*.tgz
*.tar.gz
6 changes: 6 additions & 0 deletions examples/monai-2D-mednist/.gitignore
data
*.npz
*.tgz
*.tar.gz
.mnist-pytorch
client.yaml
169 changes: 169 additions & 0 deletions examples/monai-2D-mednist/README.rst
FEDn Project: MonAI 2D Classification with the MedNIST Dataset (PyTorch)
------------------------------------------------------------------------

This is an example FEDn project based on the MonAI 2D classification tutorial for the MedNIST dataset.
The example is intended as a minimalistic quickstart and automates the handling of training data
by letting each client download and create its own partition of the dataset as it starts up.

Links:

- MonAI: https://monai.io/
- Base example notebook: https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb
- MedNIST dataset: https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz

Prerequisites
-------------

Using FEDn Studio:

- `Python 3.8, 3.9, 3.10 or 3.11 <https://www.python.org/downloads>`__
- `A FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__

If using pseudo-distributed mode with docker-compose:

- `Docker <https://docs.docker.com/get-docker>`__
- `Docker Compose <https://docs.docker.com/compose/install>`__

Creating the compute package and seed model
-------------------------------------------

Install fedn:

.. code-block::

   pip install fedn

Clone this repository, then change into this directory:

.. code-block::

   git clone https://github.com/scaleoutsystems/fedn.git
   cd fedn/examples/monai-2D-mednist

Create the compute package:

.. code-block::

   fedn package create --path client

This should create a file 'package.tgz' in the project folder.

Next, generate a seed model (the first model in a global model trail):

.. code-block::

   fedn run build --path client

This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (it builds a virtualenv).
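The seed model is a NumPy ``.npz`` archive (FEDn's ``numpyhelper`` format, where each stored array holds one set of model parameters). A quick way to inspect such an archive is sketched below; it builds and reads a dummy file, since the exact array layout of the real ``seed.npz`` depends on the MonAI model:

```python
import numpy as np

# Build a dummy archive with the same .npz structure (illustration only;
# the real seed.npz holds the MonAI model's parameter arrays).
np.savez("dummy_seed.npz", *[np.zeros((3, 3)), np.zeros(3)])

archive = np.load("dummy_seed.npz")
for name in archive.files:
    # Each entry is one parameter array, auto-named arr_0, arr_1, ...
    print(name, archive[name].shape)
```

Replacing ``dummy_seed.npz`` with your generated ``seed.npz`` shows the real parameter shapes.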

Using FEDn Studio
-----------------

Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide <https://fedn.readthedocs.io/en/stable/studio.html>`__.
On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above.

Connecting clients:
===================

**NOTE: In case a different data path needs to be set, use the env variable FEDN_DATA_PATH.**

.. code-block::

   export FEDN_PACKAGE_EXTRACT_DIR=package
   export FEDN_DATA_PATH=./data/
   export FEDN_CLIENT_SETTINGS_PATH=<full_path_to_the_dir>/client_settings.yaml
   fedn client start -in client.yaml --secure=True --force-ssl

Connecting clients using Docker:
================================

For convenience, there is a Docker image hosted on ghcr.io with fedn preinstalled. To start a client using Docker:

.. code-block::

   docker run \
     -v $PWD/client.yaml:/app/client.yaml \
     -v $PWD/client_settings.yaml:/app/client_settings.yaml \
     -e FEDN_PACKAGE_EXTRACT_DIR=package \
     -e FEDN_DATA_PATH=./data/ \
     -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \
     ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True

**NOTE: The following instructions are only for SDK-based client communication and for local development environments using Docker.**


Local development mode using Docker/docker compose
--------------------------------------------------

Follow the steps above to install FEDn and generate 'package.tgz' and 'seed.npz'.

Start a pseudo-distributed FEDn network using docker-compose:

.. code-block::

   docker compose \
     -f ../../docker-compose.yaml \
     -f docker-compose.override.yaml \
     up

This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients.
You can verify the deployment using these URLs:

- API Server: http://localhost:8092/get_controller_status
- Minio: http://localhost:9000
- Mongo Express: http://localhost:8081

Upload the package and seed model to the FEDn controller using the APIClient. In Python:

.. code-block::

   from fedn import APIClient

   client = APIClient(host="localhost", port=8092)
   client.set_active_package("package.tgz", helper="numpyhelper")
   client.set_active_model("seed.npz")

You can now start a training session with 5 rounds (default):

.. code-block::

   client.start_session()

Automate experimentation with several clients
=============================================

If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. For example,
to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3 and add one more client
by copying ``client1``, setting its ``FEDN_DATA_PATH`` to ``/app/package/data3/``.
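As a hypothetical sketch of what the added service entry might look like (the exact keys must mirror the existing ``client1`` block in ``docker-compose.override.yaml``, which this example does not reproduce):

```yaml
# Hypothetical sketch only -- copy the real client1 service definition
# from docker-compose.override.yaml and adjust these two values.
client2:
  environment:
    - FEDN_NUM_DATA_SPLITS=3             # total number of data partitions
    - FEDN_DATA_PATH=/app/package/data3/ # the partition this client trains on
```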


Access message logs and validation data from MongoDB
====================================================

You can access and download event logs and validation data via the API. As a developer, you can also query
the MongoDB backend directly using pymongo, or browse it via the Mongo Express interface:

- http://localhost:8081/db/fedn-network/

The credentials are as set in docker-compose.yaml in the root of the repository.

Access global models
====================

You can obtain global model updates from the 'fedn-models' bucket in Minio:

- http://localhost:9000

Reset the FEDn deployment
=========================

To purge all data from a deployment, including all session and round data, access the Mongo Express UI and
delete the entire ``fedn-network`` database. Then restart all services.

Clean up
========
You can clean up by running:

.. code-block::

   docker compose -f ../../docker-compose.yaml -f docker-compose.override.yaml down -v
153 changes: 153 additions & 0 deletions examples/monai-2D-mednist/client/data.py
import os
import random

import numpy as np
import PIL
import torch
import yaml
from monai.apps import download_and_extract

dir_path = os.path.dirname(os.path.realpath(__file__))
abs_path = os.path.abspath(dir_path)

DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}


def split_data(data_path="data/MedNIST", splits=100, validation_split=0.9):
# create clients
clients = {"client " + str(i): {"train": [], "validation": []} for i in range(splits)}

for class_ in os.listdir(data_path):
if os.path.isdir(os.path.join(data_path, class_)):
patients_in_class = [os.path.join(class_, patient) for patient in os.listdir(os.path.join(data_path, class_))]
np.random.shuffle(patients_in_class)
chops = np.int32(np.linspace(0, len(patients_in_class), splits + 1))
for split in range(splits):
p = patients_in_class[chops[split] : chops[split + 1]]
valsplit = np.int32(len(p) * validation_split)

clients["client " + str(split)]["train"] += p[:valsplit]
clients["client " + str(split)]["validation"] += p[valsplit:]

with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "w") as file:
yaml.dump(clients, file, default_flow_style=False)
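The ``np.linspace`` chopping in ``split_data`` above assigns each client a contiguous, near-equal slice of every class. A standalone sketch of the same partitioning logic (illustrative values, not the real dataset):

```python
import numpy as np

items = list(range(10))  # stand-in for one class's patient files
splits = 3

# Same scheme as split_data: integer breakpoints from 0 to len(items)
chops = np.int32(np.linspace(0, len(items), splits + 1))
partitions = [items[chops[i]: chops[i + 1]] for i in range(splits)]

# Each partition is contiguous, and together they cover all items exactly once
print([len(p) for p in partitions])
```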


def get_data(out_dir="data"):
"""Get data from the external repository.
:param out_dir: Path to data directory. If doesn't
:type data_dir: str
"""
# Make dir if necessary
if not os.path.exists(out_dir):
os.mkdir(out_dir)

resource = "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz"
md5 = "0bc7306e7427e00ad1c5526a6677552d"

compressed_file = os.path.join(out_dir, "MedNIST.tar.gz")

data_dir = os.path.abspath(out_dir)
print("data_dir:", data_dir)
if os.path.exists(data_dir):
print("path exist.")
if not os.path.exists(compressed_file):
print("compressed file does not exist, downloading and extracting data.")
download_and_extract(resource, compressed_file, data_dir, md5)
else:
print("files already exist.")

split_data()
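``download_and_extract`` validates the downloaded archive against the ``md5`` string above before unpacking. The underlying check is equivalent to this stdlib sketch:

```python
import hashlib

def md5_matches(payload: bytes, expected_hex: str) -> bool:
    # Hash the raw bytes and compare against the published checksum
    return hashlib.md5(payload).hexdigest() == expected_hex

# Known md5 of the empty byte string, used here purely as a demo value
print(md5_matches(b"", "d41d8cd98f00b204e9800998ecf8427e"))
```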


def get_classes(data_path):
"""Get a list of classes from the dataset
:param data_path: Path to data directory.
:type data_path: str
"""
if data_path is None:
data_path = os.environ.get("FEDN_DATA_PATH", abs_path + "/data/MedNIST")

class_names = sorted(x for x in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, x)))
return class_names


def load_data(data_path, sample_size=None, is_train=True):
"""Load data from disk.
:param data_path: Path to data directory.
:type data_path: str
:param is_train: Whether to load training or test data.
:type is_train: bool
:return: Tuple of data and labels.
:rtype: tuple
"""
if data_path is None:
data_path = os.environ.get("FEDN_DATA_PATH", abs_path + "/data/MedNIST")

class_names = get_classes(data_path)
num_class = len(class_names)

image_files_all = [[os.path.join(data_path, class_names[i], x) for x in os.listdir(os.path.join(data_path, class_names[i]))] for i in range(num_class)]

    # To keep the dataset small, pass e.g. sample_size=100 to use that many images per class.
if sample_size is None:
image_files = image_files_all

else:
image_files = [random.sample(inner_list, sample_size) for inner_list in image_files_all]

num_each = [len(image_files[i]) for i in range(num_class)]
image_files_list = []
image_class = []
for i in range(num_class):
image_files_list.extend(image_files[i])
image_class.extend([i] * num_each[i])
num_total = len(image_class)
image_width, image_height = PIL.Image.open(image_files_list[0]).size

print(f"Total image count: {num_total}")
print(f"Image dimensions: {image_width} x {image_height}")
print(f"Label names: {class_names}")
print(f"Label counts: {num_each}")

val_frac = 0.1
length = len(image_files_list)
indices = np.arange(length)
np.random.shuffle(indices)

val_split = int(val_frac * length)
val_indices = indices[:val_split]
train_indices = indices[val_split:]

train_x = [image_files_list[i] for i in train_indices]
train_y = [image_class[i] for i in train_indices]
val_x = [image_files_list[i] for i in val_indices]
val_y = [image_class[i] for i in val_indices]

print(f"Training count: {len(train_x)}, Validation count: " f"{len(val_x)}")

if is_train:
return train_x, train_y
else:
return val_x, val_y, class_names


class MedNISTDataset(torch.utils.data.Dataset):
def __init__(self, data_path, image_files, transforms):
self.data_path = data_path
self.image_files = image_files
self.transforms = transforms

def __len__(self):
return len(self.image_files)

def __getitem__(self, index):
return (self.transforms(os.path.join(self.data_path, self.image_files[index])), DATA_CLASSES[os.path.dirname(self.image_files[index])])
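``__getitem__`` above derives each sample's label from the image's parent directory name via ``DATA_CLASSES``. A standalone illustration of that lookup (the file name is hypothetical):

```python
import os

# Same mapping as in data.py: directory name -> integer class label
DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}

relative_path = "HeadCT/001420.jpeg"  # hypothetical relative image path
label = DATA_CLASSES[os.path.dirname(relative_path)]
print(label)  # 5
```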


if __name__ == "__main__":
# Prepare data if not already done
get_data()
10 changes: 10 additions & 0 deletions examples/monai-2D-mednist/client/fedn.yaml
python_env: python_env.yaml
entry_points:
build:
command: python model.py
startup:
command: python data.py
train:
command: python train.py
validate:
command: python validate.py