Merge pull request #915 from Linardos/master

Updated GaNDLF documentation for clarity
mlcommons · Aug 16, 2024 · 0635d8b
2 parents 247c881 + 4a5d06d

Showing 3 changed files with 17 additions and 10 deletions.
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -3,7 +3,7 @@ This document will help you get started with GaNDLF using a few representative e
 
 ## Installation
 
-Please follow the [installation instructions](./setup.md) to install GaNDLF. When the installation is complete, you should end up with the shell that looks like the following, which indicates that the GaNDLF virtual environment has been activated:
+Follow the [installation instructions](./setup.md) to install GaNDLF. When the installation is complete, you should end up with the following shell, which indicates that the GaNDLF virtual environment has been activated:
 
 ```bash
 (venv_gandlf) $> ### subsequent commands go here
@@ -23,7 +23,7 @@ A codespace will open in a web-based version of [Visual Studio Code](https://cod
 
 ## Sample Data
 
-Sample data will be used for our extensive automated unit tests in all examples. You can download the sample data from [this link](https://upenn.box.com/shared/static/y8162xkq1zz5555ye3pwadry2m2e39bs.zip). Example of how to do this from the terminal is shown below:
+Sample data will be used for our extensive automated unit tests in all examples. You can download the sample data from [this link](https://upenn.box.com/shared/static/y8162xkq1zz5555ye3pwadry2m2e39bs.zip). An example is shown below:
 
 ```bash
 # continue from previous shell

diff --git a/docs/index.md b/docs/index.md
@@ -1,9 +1,14 @@
 # GaNDLF
 
-The **G**ener**a**lly **N**uanced **D**eep **L**earning **F**ramework (GaNDLF) for segmentation and classification.
+The **G**ener**a**lly **N**uanced **D**eep **L**earning **F**ramework (GaNDLF) for reproducible segmentation and classification.
 
 ## Why use GaNDLF?
+GaNDLF was developed to lower the barrier to AI, enabling reproducibility, translation, and deployment.
+As an out-of-the-box solution, GaNDLF alleviates the need to build from scratch. Users may kickstart their project
+by modifying only **a configuration (config) file** that provides guidelines for the envisioned pipeline
+and **CSV inputs** that describe the training data.
 
+## Range of GaNDLF functionalities:
 - Supports multiple
     - Deep Learning model architectures
     - Channels/modalities 

diff --git a/docs/usage.md b/docs/usage.md
@@ -24,7 +24,7 @@ Please follow the [installation instructions](./setup.md#installation) to instal
 
 ### Anonymize Data
 
-A major reason why one would want to anonymize data is to ensure that trained models do not inadvertently do not encode protect health information [[1](https://doi.org/10.1145/3436755),[2](https://doi.org/10.1038/s42256-020-0186-1)]. GaNDLF can anonymize single images or a collection of images using the `gandlf anonymizer` command. It can be used as follows:
+A major reason why one would want to anonymize data is to ensure that trained models do not inadvertently encode protected health information [[1](https://doi.org/10.1145/3436755),[2](https://doi.org/10.1038/s42256-020-0186-1)]. GaNDLF can anonymize one or multiple images using the `gandlf anonymizer` command as follows:
 
 ```bash
 # continue from previous shell
@@ -81,7 +81,7 @@ Once these files are present, the patch miner can be run using the following com
 
 ### Running preprocessing before training/inference (optional)
 
-Running preprocessing before training/inference is optional, but recommended. It will significantly reduce the computational footprint during training/inference at the expense of larger storage requirements. To run preprocessing before training/inference you can use the following command, which will save the processed data in `./experiment_0/output_dir/` with a new data CSV and the corresponding model configuration:
+Running preprocessing before training/inference is optional, but recommended. It will significantly reduce the computational footprint during training/inference at the expense of larger storage requirements. Use the following command, which will save the processed data in `./experiment_0/output_dir/` with a new data CSV and the corresponding model configuration:
 
 ```bash
 # continue from previous shell
@@ -108,7 +108,7 @@ N,/full/path/N/0.nii.gz,/full/path/N/1.nii.gz,...,/full/path/N/X.nii.gz,/full/pa
 **Notes:**
 
 - `Channel` can be substituted with `Modality` or `Image`
-- `Label` can be substituted with `Mask` or `Segmentation`and is used to specify the annotation file for segmentation models
+- `Label` can be substituted with `Mask` or `Segmentation` and is used to specify the annotation file for segmentation models
 - For classification/regression, add a column called `ValueToPredict`. Currently, we are supporting only a single value prediction per model.
 - Only a single `Label` or `ValueToPredict` header should be passed 
     - Multiple segmentation classes should be in a single file with unique label numbers.
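
The header rules in the notes above can be illustrated with a minimal Python sketch that writes a data CSV for a segmentation task. This is not part of the commit and is not GaNDLF's own tooling: the function name `write_data_csv` is hypothetical, the column names `Channel_0`, `Channel_1` are an assumed numbering of the `Channel` header, and the file paths are placeholders.

```python
import csv
import io


def write_data_csv(rows, num_channels, out, with_label=True):
    """Write a data CSV with SubjectID, Channel_* and (optionally) Label columns.

    `rows` is an iterable of (subject_id, channel_paths, label_path) tuples.
    The header names are an assumption based on the notes above.
    """
    header = ["SubjectID"] + [f"Channel_{i}" for i in range(num_channels)]
    if with_label:
        # Only a single Label (or ValueToPredict) column should be present.
        header.append("Label")

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for subject_id, channels, label in rows:
        if len(channels) != num_channels:
            raise ValueError(f"{subject_id}: expected {num_channels} channels")
        record = [subject_id, *channels]
        if with_label:
            record.append(label)
        writer.writerow(record)


# Placeholder paths for one subject with two channels and one label file.
example_rows = [
    (
        "001",
        ["/full/path/001/0.nii.gz", "/full/path/001/1.nii.gz"],
        "/full/path/001/seg.nii.gz",
    ),
]

buffer = io.StringIO()
write_data_csv(example_rows, num_channels=2, out=buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For classification or regression, the same approach would apply with the final column named `ValueToPredict` and holding a value rather than a file path.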
@@ -152,14 +152,14 @@ The following command shows how the script works:
 (venv_gandlf) $> gandlf construct-csv \
   # -h, --help         Show help message and exit
   -i $DATA_DIRECTORY # this is the main data directory 
-  -c _t1.nii.gz,_t1ce.nii.gz,_t2.nii.gz,_flair.nii.gz \ # an example image identifier for 4 structural brain MR sequences for BraTS, and can be changed based on your data
+  -c _t1.nii.gz,_t1ce.nii.gz,_t2.nii.gz,_flair.nii.gz \ # an example image identifier for 4 structural brain MR sequences for BraTS, and can be changed based on your data. In the simplest case of a single modality, a ".nii.gz" will suffice
   -l _seg.nii.gz \ # an example label identifier - not needed for regression/classification, and can be changed based on your data
   -o ./experiment_0/train_data.csv # output CSV to be used for training
 ```
 
 **Notes**:
 
-- For classification/regression, add a column called `ValueToPredict`. Currently, we are supporting only a single value prediction per model.
+- For classification/regression, add a column called `ValueToPredict`. Currently, we support only a single value prediction per model.
 - `SubjectID` or `PatientName` is used to ensure that the randomized split is done per-subject rather than per-image.
 - For data arrangement different to what is described above, a customized script will need to be written to generate the CSV, or you can enter the data manually into the CSV. 
 
@@ -179,13 +179,15 @@ To split the data CSV into training, validation, and testing CSVs, the `gandlf s
 
 ## Customize the Training
 
-GaNDLF requires a YAML-based configuration that controls various aspects of the training/inference process. There are multiple samples for users to start as their baseline for further customization. A list of the available samples is presented as follows:
+Adapting GaNDLF to your needs boils down to modifying a YAML-based configuration file which controls the parameters of training and inference. Below is a list of available samples for users to start as their baseline for further customization:
 
-- [Sample showing all the available options](https://github.com/mlcommons/GaNDLF/blob/master/samples/config_all_options.yaml)
 - [Segmentation example](https://github.com/mlcommons/GaNDLF/blob/master/samples/config_segmentation_brats.yaml)
 - [Regression example](https://github.com/mlcommons/GaNDLF/blob/master/samples/config_regression.yaml)
 - [Classification example](https://github.com/mlcommons/GaNDLF/blob/master/samples/config_classification.yaml)
 
+To find **all the parameters** a GaNDLF config may modify, consult the following file: 
+- [All available options](https://github.com/mlcommons/GaNDLF/blob/master/samples/config_all_options.yaml)
+
 **Notes**: 
 
 - More details on the configuration options are available in the [customization page](customize.md).