Feature/SK-855 | Monai Example (#620)
Showing 12 changed files with 698 additions and 0 deletions.
@@ -0,0 +1,4 @@
data
seed.npz
*.tgz
*.tar.gz
@@ -0,0 +1,6 @@
data
*.npz
*.tgz
*.tar.gz
.mnist-pytorch
client.yaml
@@ -0,0 +1,169 @@
FEDn Project: MonAI 2D Classification with the MedNIST Dataset (PyTorch)
------------------------------------------------------------------------

This is an example FEDn project based on the MonAI 2D classification tutorial for the MedNIST dataset.
The example is intended as a minimalistic quickstart and automates the handling of training data
by letting each client download and create its partition of the dataset as it starts up.

Links:

- MonAI: https://monai.io/
- Base example notebook: https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb
- MedNIST dataset: https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz
Prerequisites
-------------

Using FEDn Studio:

- `Python 3.8, 3.9, 3.10 or 3.11 <https://www.python.org/downloads>`__
- `A FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__

If using pseudo-distributed mode with docker-compose:

- `Docker <https://docs.docker.com/get-docker>`__
- `Docker Compose <https://docs.docker.com/compose/install>`__
Creating the compute package and seed model
-------------------------------------------

Install fedn:

.. code-block::

   pip install fedn

Clone this repository, then navigate to this directory:

.. code-block::

   git clone https://github.com/scaleoutsystems/fedn.git
   cd fedn/examples/monai-2D-mednist

Create the compute package:

.. code-block::

   fedn package create --path client

This should create a file 'package.tgz' in the project folder.

Next, generate a seed model (the first model in a global model trail):

.. code-block::

   fedn run build --path client

This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (it builds a virtualenv).
Using FEDn Studio
-----------------

Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide <https://fedn.readthedocs.io/en/stable/studio.html>`__.
On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above.

Connecting clients:
===================

**NOTE: In case a different data path needs to be set, use the env variable FEDN_DATA_PATH.**

.. code-block::

   export FEDN_PACKAGE_EXTRACT_DIR=package
   export FEDN_DATA_PATH=./data/
   export FEDN_CLIENT_SETTINGS_PATH=<full_path_to_the_dir>/client_settings.yaml
   fedn client start -in client.yaml --secure=True --force-ssl
Connecting clients using Docker:
================================

For convenience, there is a Docker image hosted on ghcr.io with fedn preinstalled. To start a client using Docker:

.. code-block::

   docker run \
     -v $PWD/client.yaml:/app/client.yaml \
     -v $PWD/client_settings.yaml:/app/client_settings.yaml \
     -e FEDN_PACKAGE_EXTRACT_DIR=package \
     -e FEDN_DATA_PATH=./data/ \
     -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \
     ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True

**NOTE: The following instructions are only for SDK-based client communication and for local development environments using Docker.**
Local development mode using Docker/docker compose
--------------------------------------------------

Follow the steps above to install FEDn and generate 'package.tgz' and 'seed.npz'.

Start a pseudo-distributed FEDn network using docker-compose:

.. code-block::

   docker compose \
     -f ../../docker-compose.yaml \
     -f docker-compose.override.yaml \
     up

This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients.
You can verify the deployment using these URLs:

- API Server: http://localhost:8092/get_controller_status
- Minio: http://localhost:9000
- Mongo Express: http://localhost:8081

Upload the package and seed model to the FEDn controller using the APIClient. In Python:

.. code-block::

   from fedn import APIClient
   client = APIClient(host="localhost", port=8092)
   client.set_active_package("package.tgz", helper="numpyhelper")
   client.set_active_model("seed.npz")

You can now start a training session with 5 rounds (the default):

.. code-block::

   client.start_session()
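If the compose stack is still starting, the first API call can fail with a connection error. A small retry wrapper around the three calls shown above can help; this is a sketch, not part of FEDn, and it only assumes an object exposing the ``set_active_package``, ``set_active_model`` and ``start_session`` methods used in this README:

```python
import time


def bootstrap(client, package="package.tgz", seed="seed.npz", retries=3, delay=1.0):
    """Upload the compute package and seed model, then start a session.

    `client` is any object with the three APIClient methods shown above;
    ConnectionError is retried to tolerate a server that is still booting.
    """
    last_err = None
    for _ in range(retries):
        try:
            client.set_active_package(package, helper="numpyhelper")
            client.set_active_model(seed)
            return client.start_session()
        except ConnectionError as err:
            last_err = err
            time.sleep(delay)  # give the API server time to come up, then retry
    raise RuntimeError("could not reach the FEDn API server") from last_err
```

With the APIClient from the snippet above, this would be called as ``bootstrap(APIClient(host="localhost", port=8092))``.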
Automate experimentation with several clients
=============================================

If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. For example,
to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client
by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data3/``.
Access message logs and validation data from MongoDB
====================================================

You can access and download event logs and validation data via the API, and as a developer you can also inspect
the MongoDB backend data using pymongo or via the Mongo Express interface:

- http://localhost:8081/db/fedn-network/

The credentials are as set in docker-compose.yaml in the root of the repository.

Access global models
====================

You can obtain global model updates from the 'fedn-models' bucket in Minio:

- http://localhost:9000

Reset the FEDn deployment
=========================

To purge all data from a deployment, including all session and round data, access the Mongo Express UI and
delete the entire ``fedn-network`` collection. Then restart all services.

Clean up
========

You can clean up by running:

.. code-block::

   docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml down -v
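The per-client data partitioning that happens at client startup boils down to cutting each class into contiguous, near-equal index ranges with ``np.linspace`` (the trick used by ``split_data`` in the client code). A minimal standalone sketch with made-up file names:

```python
import numpy as np


def partition(items, splits):
    """Cut a list into `splits` contiguous, near-equal parts using integer
    boundaries derived from np.linspace."""
    chops = np.int32(np.linspace(0, len(items), splits + 1))
    return [items[chops[i] : chops[i + 1]] for i in range(splits)]


# 10 synthetic file names split across 3 clients.
files = [f"CXR/{i:06d}.jpeg" for i in range(10)]
parts = partition(files, 3)

print([len(p) for p in parts])  # near-equal sizes
# Every file lands in exactly one partition.
assert sum(len(p) for p in parts) == len(files)
```

Because the boundaries come from one monotone sequence, the parts never overlap and never drop an element, regardless of whether the class size divides evenly.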
@@ -0,0 +1,153 @@
import os
import random

import numpy as np
import PIL
import torch
import yaml
from monai.apps import download_and_extract

dir_path = os.path.dirname(os.path.realpath(__file__))
abs_path = os.path.abspath(dir_path)

DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}
def split_data(data_path="data/MedNIST", splits=100, validation_split=0.9):
    """Split each class into contiguous per-client partitions and write them to data_splits.yaml."""
    # create clients
    clients = {"client " + str(i): {"train": [], "validation": []} for i in range(splits)}

    for class_ in os.listdir(data_path):
        if os.path.isdir(os.path.join(data_path, class_)):
            patients_in_class = [os.path.join(class_, patient) for patient in os.listdir(os.path.join(data_path, class_))]
            np.random.shuffle(patients_in_class)
            chops = np.int32(np.linspace(0, len(patients_in_class), splits + 1))
            for split in range(splits):
                p = patients_in_class[chops[split] : chops[split + 1]]
                valsplit = np.int32(len(p) * validation_split)

                clients["client " + str(split)]["train"] += p[:valsplit]
                clients["client " + str(split)]["validation"] += p[valsplit:]

    with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "w") as file:
        yaml.dump(clients, file, default_flow_style=False)
def get_data(out_dir="data"):
    """Get data from the external repository.

    :param out_dir: Path to data directory. Created if it doesn't exist.
    :type out_dir: str
    """
    # Make dir if necessary
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)

    resource = "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz"
    md5 = "0bc7306e7427e00ad1c5526a6677552d"

    compressed_file = os.path.join(out_dir, "MedNIST.tar.gz")

    data_dir = os.path.abspath(out_dir)
    print("data_dir:", data_dir)
    if os.path.exists(data_dir):
        print("path exists.")
    if not os.path.exists(compressed_file):
        print("compressed file does not exist, downloading and extracting data.")
        download_and_extract(resource, compressed_file, data_dir, md5)
    else:
        print("files already exist.")

    split_data()
def get_classes(data_path):
    """Get a list of classes from the dataset.

    :param data_path: Path to data directory.
    :type data_path: str
    """
    if data_path is None:
        data_path = os.environ.get("FEDN_DATA_PATH", abs_path + "/data/MedNIST")

    class_names = sorted(x for x in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, x)))
    return class_names
def load_data(data_path, sample_size=None, is_train=True):
    """Load data from disk.

    :param data_path: Path to data directory.
    :type data_path: str
    :param sample_size: Number of images to sample per class (all if None).
    :type sample_size: int
    :param is_train: Whether to load training or validation data.
    :type is_train: bool
    :return: Tuple of data and labels.
    :rtype: tuple
    """
    if data_path is None:
        data_path = os.environ.get("FEDN_DATA_PATH", abs_path + "/data/MedNIST")

    class_names = get_classes(data_path)
    num_class = len(class_names)

    image_files_all = [[os.path.join(data_path, class_names[i], x) for x in os.listdir(os.path.join(data_path, class_names[i]))] for i in range(num_class)]

    # To keep the dataset small, sample sample_size images of each class.
    if sample_size is None:
        image_files = image_files_all
    else:
        image_files = [random.sample(inner_list, sample_size) for inner_list in image_files_all]

    num_each = [len(image_files[i]) for i in range(num_class)]
    image_files_list = []
    image_class = []
    for i in range(num_class):
        image_files_list.extend(image_files[i])
        image_class.extend([i] * num_each[i])
    num_total = len(image_class)
    image_width, image_height = PIL.Image.open(image_files_list[0]).size

    print(f"Total image count: {num_total}")
    print(f"Image dimensions: {image_width} x {image_height}")
    print(f"Label names: {class_names}")
    print(f"Label counts: {num_each}")

    val_frac = 0.1
    length = len(image_files_list)
    indices = np.arange(length)
    np.random.shuffle(indices)

    val_split = int(val_frac * length)
    val_indices = indices[:val_split]
    train_indices = indices[val_split:]

    train_x = [image_files_list[i] for i in train_indices]
    train_y = [image_class[i] for i in train_indices]
    val_x = [image_files_list[i] for i in val_indices]
    val_y = [image_class[i] for i in val_indices]

    print(f"Training count: {len(train_x)}, Validation count: {len(val_x)}")

    if is_train:
        return train_x, train_y
    else:
        return val_x, val_y, class_names
class MedNISTDataset(torch.utils.data.Dataset):
    def __init__(self, data_path, image_files, transforms):
        self.data_path = data_path
        self.image_files = image_files
        self.transforms = transforms

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        return (self.transforms(os.path.join(self.data_path, self.image_files[index])), DATA_CLASSES[os.path.dirname(self.image_files[index])])


if __name__ == "__main__":
    # Prepare data if not already done
    get_data()
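The label lookup in ``__getitem__`` relies on the class name being the directory component of each relative path (the form ``split_data`` produces via ``os.path.join(class_, patient)``). That mapping can be exercised without torch or any image files; the file names below are made up for illustration:

```python
import os

# Same class table as in the module above.
DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}


def label_for(relative_path):
    """Integer label for a 'ClassName/file.jpeg' style relative path."""
    class_name = os.path.dirname(relative_path)
    return DATA_CLASSES[class_name]


print(label_for("CXR/004562.jpeg"))     # 2
print(label_for("HeadCT/001203.jpeg"))  # 5
```

Note this breaks for absolute paths or nested directories, which is why the dataset stores class-relative paths.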
@@ -0,0 +1,10 @@
python_env: python_env.yaml
entry_points:
  build:
    command: python model.py
  startup:
    command: python data.py
  train:
    command: python train.py
  validate:
    command: python validate.py
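The file above maps FEDn lifecycle stages (build, startup, train, validate) to shell commands. A toy lookup illustrating the dispatch idea; this is not FEDn's actual dispatcher, and the mapping is hard-coded here rather than parsed from the file:

```python
# Hard-coded mirror of the entry_points section above (illustrative only).
ENTRY_POINTS = {
    "build": "python model.py",
    "startup": "python data.py",
    "train": "python train.py",
    "validate": "python validate.py",
}


def command_for(stage):
    """Return the command a dispatcher would run for a lifecycle stage."""
    try:
        return ENTRY_POINTS[stage]
    except KeyError:
        raise ValueError(f"no entry point defined for stage '{stage}'") from None


print(command_for("train"))  # python train.py
```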