Feature/SK-855 | Monai Example (#620)
Showing 12 changed files with 698 additions and 0 deletions.
@@ -0,0 +1,4 @@
data
seed.npz
*.tgz
*.tar.gz
@@ -0,0 +1,6 @@
data
*.npz
*.tgz
*.tar.gz
.mnist-pytorch
client.yaml
@@ -0,0 +1,169 @@
FEDn Project: MonAI 2D Classification with the MedNIST Dataset (PyTorch)
------------------------------------------------------------------------

This is an example FEDn project based on the MonAI 2D classification tutorial for the MedNIST dataset.
The example is intended as a minimalistic quickstart and automates the handling of training data
by letting each client download and create its partition of the dataset as it starts up.

Links:

- MonAI: https://monai.io/
- Base example notebook: https://github.com/Project-MONAI/tutorials/blob/main/2d_classification/mednist_tutorial.ipynb
- MedNIST dataset: https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz
Prerequisites
-------------

Using FEDn Studio:

- `Python 3.8, 3.9, 3.10 or 3.11 <https://www.python.org/downloads>`__
- `A FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__

If using pseudo-distributed mode with docker-compose:

- `Docker <https://docs.docker.com/get-docker>`__
- `Docker Compose <https://docs.docker.com/compose/install>`__
Creating the compute package and seed model
-------------------------------------------

Install fedn:

.. code-block::

   pip install fedn

Clone this repository, then navigate to this directory:

.. code-block::

   git clone https://github.com/scaleoutsystems/fedn.git
   cd fedn/examples/monai-2D-mednist

Create the compute package:

.. code-block::

   fedn package create --path client

This should create a file 'package.tgz' in the project folder.

Next, generate a seed model (the first model in a global model trail):

.. code-block::

   fedn run build --path client

This will create a seed model called 'seed.npz' in the root of the project. This step will take a few minutes, depending on hardware and internet connection (it builds a virtualenv).
Using FEDn Studio
-----------------

Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide <https://fedn.readthedocs.io/en/stable/studio.html>`__.
On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above.

Connecting clients:
===================

**NOTE: In case a different data path needs to be set, use the env variable FEDN_DATA_PATH.**

.. code-block::

   export FEDN_PACKAGE_EXTRACT_DIR=package
   export FEDN_DATA_PATH=./data/
   export FEDN_CLIENT_SETTINGS_PATH=<full_path_to_the_dir>/client_settings.yaml
   fedn client start -in client.yaml --secure=True --force-ssl
Connecting clients using Docker:
================================

For convenience, there is a Docker image hosted on ghcr.io with fedn preinstalled. To start a client using Docker:

.. code-block::

   docker run \
     -v $PWD/client.yaml:/app/client.yaml \
     -v $PWD/client_settings.yaml:/app/client_settings.yaml \
     -e FEDN_PACKAGE_EXTRACT_DIR=package \
     -e FEDN_DATA_PATH=./data/ \
     -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \
     ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True

**NOTE: The following instructions are only for SDK-based client communication and for local development environments using Docker.**
Local development mode using Docker/docker compose
--------------------------------------------------

Follow the steps above to install FEDn and generate 'package.tgz' and 'seed.npz'.

Start a pseudo-distributed FEDn network using docker-compose:

.. code-block::

   docker compose \
     -f ../../docker-compose.yaml \
     -f docker-compose.override.yaml \
     up

This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients.
You can verify the deployment using these URLs:

- API Server: http://localhost:8092/get_controller_status
- Minio: http://localhost:9000
- Mongo Express: http://localhost:8081

Upload the package and seed model to the FEDn controller using the APIClient. In Python:

.. code-block::

   from fedn import APIClient
   client = APIClient(host="localhost", port=8092)
   client.set_active_package("package.tgz", helper="numpyhelper")
   client.set_active_model("seed.npz")

You can now start a training session with 5 rounds (the default):

.. code-block::

   client.start_session()
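If the compose stack is still starting, the first API call can fail with a connection error. A small retry wrapper around the three calls shown above can help; this is a sketch, not part of FEDn, and it only assumes an object exposing the ``set_active_package``, ``set_active_model`` and ``start_session`` methods used in this README:

```python
import time


def bootstrap(client, package="package.tgz", seed="seed.npz", retries=3, delay=1.0):
    """Upload the compute package and seed model, then start a session.

    `client` is any object with the three APIClient methods shown above;
    ConnectionError is retried to tolerate a server that is still booting.
    """
    last_err = None
    for _ in range(retries):
        try:
            client.set_active_package(package, helper="numpyhelper")
            client.set_active_model(seed)
            return client.start_session()
        except ConnectionError as err:
            last_err = err
            time.sleep(delay)  # give the API server time to come up, then retry
    raise RuntimeError("could not reach the FEDn API server") from last_err
```

With the APIClient from the snippet above, this would be called as ``bootstrap(APIClient(host="localhost", port=8092))``.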
Automate experimentation with several clients
=============================================

If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. For example,
to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client
by copying ``client1`` and setting ``FEDN_DATA_PATH`` to ``/app/package/data3/``.
Access message logs and validation data from MongoDB
====================================================

You can access and download event logs and validation data via the API, and as a developer you can also inspect
the MongoDB backend data using pymongo or via the Mongo Express interface:

- http://localhost:8081/db/fedn-network/

The credentials are as set in docker-compose.yaml in the root of the repository.

Access global models
====================

You can obtain global model updates from the 'fedn-models' bucket in Minio:

- http://localhost:9000

Reset the FEDn deployment
=========================

To purge all data from a deployment, including all session and round data, access the Mongo Express UI and
delete the entire ``fedn-network`` collection. Then restart all services.

Clean up
========

You can clean up by running:

.. code-block::

   docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml down -v
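The per-client data partitioning that happens at client startup boils down to cutting each class into contiguous, near-equal index ranges with ``np.linspace`` (the trick used by ``split_data`` in the client code). A minimal standalone sketch with made-up file names:

```python
import numpy as np


def partition(items, splits):
    """Cut a list into `splits` contiguous, near-equal parts using integer
    boundaries derived from np.linspace."""
    chops = np.int32(np.linspace(0, len(items), splits + 1))
    return [items[chops[i] : chops[i + 1]] for i in range(splits)]


# 10 synthetic file names split across 3 clients.
files = [f"CXR/{i:06d}.jpeg" for i in range(10)]
parts = partition(files, 3)

print([len(p) for p in parts])  # near-equal sizes
# Every file lands in exactly one partition.
assert sum(len(p) for p in parts) == len(files)
```

Because the boundaries come from one monotone sequence, the parts never overlap and never drop an element, regardless of whether the class size divides evenly.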
@@ -0,0 +1,153 @@
import os
import random

import numpy as np
import PIL
import torch
import yaml
from monai.apps import download_and_extract

dir_path = os.path.dirname(os.path.realpath(__file__))
abs_path = os.path.abspath(dir_path)

DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}
def split_data(data_path="data/MedNIST", splits=100, validation_split=0.9):
    """Split each class into contiguous per-client partitions and write them to data_splits.yaml."""
    # create clients
    clients = {"client " + str(i): {"train": [], "validation": []} for i in range(splits)}

    for class_ in os.listdir(data_path):
        if os.path.isdir(os.path.join(data_path, class_)):
            patients_in_class = [os.path.join(class_, patient) for patient in os.listdir(os.path.join(data_path, class_))]
            np.random.shuffle(patients_in_class)
            chops = np.int32(np.linspace(0, len(patients_in_class), splits + 1))
            for split in range(splits):
                p = patients_in_class[chops[split] : chops[split + 1]]
                valsplit = np.int32(len(p) * validation_split)

                clients["client " + str(split)]["train"] += p[:valsplit]
                clients["client " + str(split)]["validation"] += p[valsplit:]

    with open(os.path.join(os.path.dirname(data_path), "data_splits.yaml"), "w") as file:
        yaml.dump(clients, file, default_flow_style=False)
def get_data(out_dir="data"):
    """Get data from the external repository.

    :param out_dir: Path to data directory. Created if it doesn't exist.
    :type out_dir: str
    """
    # Make dir if necessary
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)

    resource = "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz"
    md5 = "0bc7306e7427e00ad1c5526a6677552d"

    compressed_file = os.path.join(out_dir, "MedNIST.tar.gz")

    data_dir = os.path.abspath(out_dir)
    print("data_dir:", data_dir)
    if os.path.exists(data_dir):
        print("path exists.")
    if not os.path.exists(compressed_file):
        print("compressed file does not exist, downloading and extracting data.")
        download_and_extract(resource, compressed_file, data_dir, md5)
    else:
        print("files already exist.")

    split_data()
def get_classes(data_path):
    """Get a list of classes from the dataset.

    :param data_path: Path to data directory.
    :type data_path: str
    """
    if data_path is None:
        data_path = os.environ.get("FEDN_DATA_PATH", abs_path + "/data/MedNIST")

    class_names = sorted(x for x in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, x)))
    return class_names
def load_data(data_path, sample_size=None, is_train=True):
    """Load data from disk.

    :param data_path: Path to data directory.
    :type data_path: str
    :param sample_size: Number of images to sample per class (all if None).
    :type sample_size: int
    :param is_train: Whether to load training or validation data.
    :type is_train: bool
    :return: Tuple of data and labels.
    :rtype: tuple
    """
    if data_path is None:
        data_path = os.environ.get("FEDN_DATA_PATH", abs_path + "/data/MedNIST")

    class_names = get_classes(data_path)
    num_class = len(class_names)

    image_files_all = [[os.path.join(data_path, class_names[i], x) for x in os.listdir(os.path.join(data_path, class_names[i]))] for i in range(num_class)]

    # To keep the dataset small, sample sample_size images of each class.
    if sample_size is None:
        image_files = image_files_all
    else:
        image_files = [random.sample(inner_list, sample_size) for inner_list in image_files_all]

    num_each = [len(image_files[i]) for i in range(num_class)]
    image_files_list = []
    image_class = []
    for i in range(num_class):
        image_files_list.extend(image_files[i])
        image_class.extend([i] * num_each[i])
    num_total = len(image_class)
    image_width, image_height = PIL.Image.open(image_files_list[0]).size

    print(f"Total image count: {num_total}")
    print(f"Image dimensions: {image_width} x {image_height}")
    print(f"Label names: {class_names}")
    print(f"Label counts: {num_each}")

    val_frac = 0.1
    length = len(image_files_list)
    indices = np.arange(length)
    np.random.shuffle(indices)

    val_split = int(val_frac * length)
    val_indices = indices[:val_split]
    train_indices = indices[val_split:]

    train_x = [image_files_list[i] for i in train_indices]
    train_y = [image_class[i] for i in train_indices]
    val_x = [image_files_list[i] for i in val_indices]
    val_y = [image_class[i] for i in val_indices]

    print(f"Training count: {len(train_x)}, Validation count: {len(val_x)}")

    if is_train:
        return train_x, train_y
    else:
        return val_x, val_y, class_names
class MedNISTDataset(torch.utils.data.Dataset):
    def __init__(self, data_path, image_files, transforms):
        self.data_path = data_path
        self.image_files = image_files
        self.transforms = transforms

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        return (self.transforms(os.path.join(self.data_path, self.image_files[index])), DATA_CLASSES[os.path.dirname(self.image_files[index])])


if __name__ == "__main__":
    # Prepare data if not already done
    get_data()
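The label lookup in ``__getitem__`` relies on the class name being the directory component of each relative path (the form ``split_data`` produces via ``os.path.join(class_, patient)``). That mapping can be exercised without torch or any image files; the file names below are made up for illustration:

```python
import os

# Same class table as in the module above.
DATA_CLASSES = {"AbdomenCT": 0, "BreastMRI": 1, "CXR": 2, "ChestCT": 3, "Hand": 4, "HeadCT": 5}


def label_for(relative_path):
    """Integer label for a 'ClassName/file.jpeg' style relative path."""
    class_name = os.path.dirname(relative_path)
    return DATA_CLASSES[class_name]


print(label_for("CXR/004562.jpeg"))     # 2
print(label_for("HeadCT/001203.jpeg"))  # 5
```

Note this breaks for absolute paths or nested directories, which is why the dataset stores class-relative paths.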
@@ -0,0 +1,10 @@
python_env: python_env.yaml
entry_points:
  build:
    command: python model.py
  startup:
    command: python data.py
  train:
    command: python train.py
  validate:
    command: python validate.py
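The file above maps FEDn lifecycle stages (build, startup, train, validate) to shell commands. A toy lookup illustrating the dispatch idea; this is not FEDn's actual dispatcher, and the mapping is hard-coded here rather than parsed from the file:

```python
# Hard-coded mirror of the entry_points section above (illustrative only).
ENTRY_POINTS = {
    "build": "python model.py",
    "startup": "python data.py",
    "train": "python train.py",
    "validate": "python validate.py",
}


def command_for(stage):
    """Return the command a dispatcher would run for a lifecycle stage."""
    try:
        return ENTRY_POINTS[stage]
    except KeyError:
        raise ValueError(f"no entry point defined for stage '{stage}'") from None


print(command_for("train"))  # python train.py
```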