📜 This is the official code repository for DermSynth3D.
📢 DermSynth3D has been accepted at MedIA (Medical Image Analysis) 🎉.
🤗 NEW: Try out the DermSynth3D web demo here.
📺 Check out the video abstract for this work:
A data generation pipeline for creating photorealistic in-the-wild synthetic dermatological data with rich annotations such as semantic segmentation masks, depth maps, and bounding boxes for various skin analysis tasks.
The figure shows the DermSynth3D computational pipeline, where 2D segmented skin conditions are blended into the texture image of a 3D mesh at locations outside the hair and clothing regions. After blending, 2D views of the mesh are rendered from a variety of camera viewpoints and lighting conditions and combined with background images to create a synthetic dermatology dataset.
In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis.
However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions.
To address these shortcomings, we propose a novel framework called DermSynth3D.
DermSynth3D/
┣ assets/ # assets for the README
┣ configs/ # YAML config files to run the pipeline
┣ logs/ # experiment logs are saved here (auto created)
┣ out/ # the checkpoints are saved here (auto created)
┣ data/ # directory to store the data
┃ ┣ ... # detailed instructions in the dataset.md
┣ dermsynth3d/ # core source code for the DermSynth3D pipeline
┃ ┣ datasets/ # class definitions for the datasets
┃ ┣ deepblend/ # code for deep blending
┃ ┣ losses/ # loss functions
┃ ┣ models/ # model definitions
┃ ┣ tools/ # wrappers for synthetic data generation
┃ ┗ utils/ # helper functions
┣ notebooks/ # demo notebooks for the pipeline
┣ scripts/ # scripts for training and evaluation
┗ skin3d/ # external module
- DermSynth3D
- TL;DR
- Motivation
- Repository layout
- Table of Contents
- Datasets
- The folder structure of the data directory should be as follows:
- Data for Blending
- Download 3DBodyTex.v1 meshes
- Download the 3DBodyTex.v1 annotations
- Download the Fitzpatrick17k dataset
- Download the Background Scenes
- Data For Training
- Download the FUSeg dataset
- Download the Pratheepan dataset
- Download the PH2 dataset
- Download the DermoFit dataset
- Creating the Synthetic dataset
- How to Use DermSynth3D
- Preparing Dataset for Experiments
- Cite
- Demo Notebooks for Dermatology Tasks
- Acknowledgements
git clone --recurse-submodules https://github.com/sfu-mial/DermSynth3D.git
cd DermSynth3D
conda env create -f dermsynth3d.yml
conda activate dermsynth3d
# Build the container in the root dir
docker build -t dermsynth3d --build-arg USER=$USER --build-arg UID=$(id -u) --build-arg GID=$(id -g) -f Dockerfile .
# Run the container in interactive mode for using DermSynth3D
# See 3. How to use DermSynth3D
docker run --gpus all --user=root --runtime=nvidia -it --rm -v /path/to/downloaded/data:/data dermsynth3d
We also provide some pre-built Docker images, which can be used as follows:
# pull this latest docker image with the latest code
# you need to prepare the data following the instructions below
docker pull sinashish/dermsynth3d:latest
# pull this image to try out the code with demo data, i.e., lesions and meshes
docker pull sinashish/dermsynth3d:demo_w_code
# Run the container in interactive GPU mode for generating data and training models
# mount the data directory to the container
docker run --gpus all -it --user=root --runtime=nvidia --rm -v /path/to/downloaded/data:/data sinashish/dermsynth3d:<tag name>
NOTE: The code has been tested on Ubuntu 20.04 with CUDA 11.1, Python 3.8, PyTorch 1.10.0, and PyTorch3D 0.7.2; CPU-only execution is untested.
If you face any issues installing pytorch3d, please refer to their installation guide or this issue link.
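As a quick optional sanity check (not part of the repository), you can confirm that the key packages import and that the GPU is visible from inside the conda environment or container:

```python
# Minimal environment sanity check; the versions printed simply echo
# whatever you installed via the conda env or Docker image above.
import torch
import pytorch3d

print("torch:", torch.__version__)
print("pytorch3d:", pytorch3d.__version__)
print("CUDA available:", torch.cuda.is_available())
```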
Follow the instructions below to download the datasets for generating the synthetic data and training models for various tasks.
All the datasets should be downloaded and placed in the data directory.
DermSynth3D/
┣ ... # other source code
┣ data/ # directory to store the data
┃ ┣ 3dbodytex-1.1-highres # data for 3DBodyTex.v1 3d models and texture maps
┃ ┣ fitzpatrick17k/
┃ ┃ ┣ data/ # Fitzpatrick17k images
┃ ┃ ┗ annotations/ # annotations for Fitzpatrick17k lesions
┃ ┣ ph2/
┃ ┃ ┣ images/ # PH2 images
┃ ┃ ┗ labels/ # PH2 annotations
┃ ┣ dermofit/ # Dermofit dataset
┃ ┃ ┣ images/ # Dermofit images
┃ ┃ ┗ targets/ # Dermofit annotations
┃ ┣ FUSeg/
┃ ┃ ┣ train/ # training set with images/labels for FUSeg
┃ ┃ ┣ validation/ # val set with images/labels for FUSeg
┃ ┃ ┗ test/ # test set with images/labels for FUSeg
┃ ┣ Pratheepan_Dataset/
┃ ┃ ┣ FacePhoto/ # images from Pratheepan dataset
┃ ┃ ┗ GroundT_FacePhoto/ # annotations
┃ ┣ lesions/ # keep the non-skin masks for 3DBodyTex.v1 meshes here
┃ ┣ annotations/ # segmentation masks for Annotated Fitzpatrick17k lesions
┃ ┣ bodytex_anatomy_labels/ # per-vertex labels for anatomy of 3DBodyTex.v1 meshes
┃ ┣ background/ # keep the background scenes for rendering here
┃ ┗ synth_data/ # the generated synthetic data will be stored here
┃   ┣ train/ # training set with images/labels for training on synthetic data
┃   ┗ <val/test>/ # val and test sets with images/labels for training on synthetic data
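As an optional convenience (not a repository script), a short check like the following can confirm that the blending-related folders are in place before running the pipeline; the directory names simply mirror the layout above:

```python
# Hypothetical helper: verify the expected data folders exist under data/.
from pathlib import Path

DATA_ROOT = Path("data")
expected = [
    "3dbodytex-1.1-highres",
    "fitzpatrick17k/data",
    "fitzpatrick17k/annotations",
    "lesions",
    "annotations",
    "bodytex_anatomy_labels",
    "background",
]

for rel in expected:
    status = "ok" if (DATA_ROOT / rel).is_dir() else "MISSING"
    print(f"{status:8s} {DATA_ROOT / rel}")
```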
The datasets used in this work can be broadly categorized into data required for blending and data necessary for evaluation.
Data for Blending
Download 3DBodyTex.v1 meshes
A few examples of raw 3D scans in sports clothing from the 3DBodyTex.v1 dataset, showing a wide range of body shapes, poses, skin tones, and genders.
The 3DBodyTex.v1 dataset can be downloaded from here. It contains the meshes and texture images used in this work and is available from the external site linked above after accepting a license agreement.

NOTE: These textured meshes are required to run the data-generation code.
We provide the non-skin texture-map annotations for 2 meshes: 006-f-run and 221-m-u. Hence, to generate the data, make sure to get the .obj files for these two meshes and place them in data/3dbodytex-1.1-highres before executing scripts/gen_data.py.

After accepting the licence, download and unzip the data in ./data/.
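If you want to confirm that the two annotated scans are in place, a minimal sketch like the following (assuming each scan folder holds a single textured .obj, which may not match your extraction exactly) loads them with PyTorch3D:

```python
# Optional check (not part of the repo) that the two required 3DBodyTex.v1
# scans are present and load correctly with PyTorch3D.
from pathlib import Path
from pytorch3d.io import load_objs_as_meshes

mesh_dir = Path("data/3dbodytex-1.1-highres")
for scan_id in ["006-f-run", "221-m-u"]:
    # Glob for the .obj rather than assuming an exact filename.
    objs = sorted((mesh_dir / scan_id).glob("*.obj"))
    assert objs, f"No .obj found for {scan_id} under {mesh_dir}"
    mesh = load_objs_as_meshes([str(objs[0])], load_textures=True)
    print(scan_id, "->", mesh.verts_packed().shape[0], "vertices")
```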
Download the 3DBodyTex.v1 annotations
| Non-skin texture maps | Anatomy labels |
| --- | --- |
| We provide the non-skin texture map ($T_{nonskin}$) annotations for 215 meshes from the 3DBodyTex.v1 dataset here. | We provide the per-vertex labels for the anatomical parts of the 3DBodyTex.v1 meshes, obtained by fitting the SCAPE template body model, here. |
| A sample texture image showing the annotations for non-skin regions. | A few examples of the scans showing the 7 anatomy labels. |

The folders are organised with the same IDs as the meshes in the 3DBodyTex.v1 dataset.

NOTE: To download the 3DBodyTex.v1 annotations from the links above, you need to request access to the 3DBodyTex.DermSynth dataset by following the instructions at this link.
Download the Fitzpatrick17k dataset
An illustration showing lesions from the Fitzpatrick17k dataset in the top row and their corresponding manually segmented lesion annotations in the bottom row.

We used the skin conditions from Fitzpatrick17k. See their instructions to get access to the Fitzpatrick17k images. We provide the raw images for the Fitzpatrick17k dataset here.
After downloading the dataset, unzip it:
unzip fitzpatrick17k.zip -d data/fitzpatrick17k/

We provide a few samples of the densely annotated lesion masks from the Fitzpatrick17k dataset within this repository under the data directory. More such annotations can be downloaded from here.
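To inspect an image/mask pair, a small illustrative snippet like the one below can be used; the image ID, file extensions, and exact folder names are placeholders and may need adjusting to the actual Fitzpatrick17k layout:

```python
# Illustrative only: load one Fitzpatrick17k image and its lesion mask.
from pathlib import Path
import numpy as np
from PIL import Image

img_dir = Path("data/fitzpatrick17k/data")
mask_dir = Path("data/fitzpatrick17k/annotations")

image_id = "example_id"  # placeholder: replace with a real image ID
image = np.asarray(Image.open(img_dir / f"{image_id}.jpg").convert("RGB"))
mask = np.asarray(Image.open(mask_dir / f"{image_id}.png"))  # assumed mask format

print(image.shape, mask.shape, "lesion pixels:", int((mask > 0).sum()))
```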
Download the Background Scenes
A few examples of the background scenes used for rendering the synthetic data.
Although you can use any scenes as backgrounds for generating the random views of the lesioned meshes, we used SceneNet RGB-D for the background indoor scenes. Specifically, we used this split and sampled 3000 images from it.
For convenience, the background scenes we used to generate the synthetic dataset can be downloaded from here.
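If you prefer to sample your own background images, a rough sketch along these lines would work; the SceneNet RGB-D source path and the photo/*.jpg frame layout are assumptions about the downloaded split:

```python
# Sketch: sample up to 3,000 background frames from a SceneNet RGB-D split
# into data/background/. Paths are assumptions; adjust to your download.
import random
import shutil
from pathlib import Path

src = Path("/path/to/SceneNetRGBD/split")   # downloaded SceneNet RGB-D split
dst = Path("data/background")
dst.mkdir(parents=True, exist_ok=True)

random.seed(0)
frames = sorted(src.rglob("photo/*.jpg"))   # RGB frames are assumed under photo/
for i, frame in enumerate(random.sample(frames, min(3000, len(frames)))):
    shutil.copy(frame, dst / f"{i:05d}.jpg")
```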
Data For Training

Download the FUSeg dataset

A few examples from the FUSeg dataset showing the images in the top row and their corresponding segmentation masks in the bottom row.
The Foot Ulcer Segmentation Challenge (FUSeg) dataset is available to download from their official repository. Download and unpack the dataset at data/FUSeg/, maintaining the folder structure shown above. For simplicity, we mirror the FUSeg dataset here.
Download the Pratheepan dataset

A few examples from the Pratheepan dataset showing the images and their corresponding segmentation masks, in the top and bottom rows respectively.
The Pratheepan dataset is available to download from their official website. The images and the corresponding ground truth masks are available in a ZIP file hosted on Google Drive. Download and unpack the dataset at data/Pratheepan_Dataset/.
Download the PH2 dataset

A few examples from the PH2 dataset showing lesions and their corresponding segmentation masks, in the top and bottom rows respectively.
The PH2 dataset can be downloaded from the official ADDI Project website. Download and unpack the dataset at data/ph2/, maintaining the folder structure shown above.
Download the DermoFit dataset

An illustration of a few samples from the DermoFit dataset showing the skin lesions and their corresponding binary masks, in the top and bottom rows respectively.
The DermoFit dataset is available through a paid perpetual academic license from the University of Edinburgh. Please obtain the dataset by following the instructions for the DermoFit Image Library and unpack it at data/dermofit/, maintaining the folder structure shown above.
Creating the Synthetic dataset

Generated synthetic images of multiple subjects across a range of skin tones with various skin conditions, background scenes, lighting, and viewpoints.
For convenience, we provide the generated synthetic data we used in this work for various downstream tasks here.
If you want to train your models on a different split of the synthetic data, you can download a dataset generated by blending lesions onto 26 3DBodyTex scans from here. To prepare the synthetic dataset for training, sample the images and targets from the path where you saved this dataset and then organise them into train/val splits.

NOTE: To download the synthetic 3DBodyTex.DermSynth dataset referred to in the links above, you need to request access by following the instructions at this link.
Alternatively, you can use the provided script scripts/prep_data.py to create it. Even better, you can generate your own dataset by following the instructions here.
How to Use DermSynth3D

A few examples of annotated data synthesized using DermSynth3D. The rows from top to bottom show respectively: the rendered images with blended skin conditions, bounding boxes around the lesions, GT semantic segmentation masks, grouped anatomical labels, and the monocular depth maps produced by the renderer.
Before running any code to synthesize densely annotated data as shown above, make sure that you have downloaded the data necessary for blending (as described in Datasets) and that the folder structure matches the one described above.
If your folder structure is different from ours, then update the paths, such as bodytex_dir, annot_dir, etc., accordingly in configs/blend.yaml.
Now, to generate the synthetic data with the default parameters, simply run the following command to generate 2000 views for a specified mesh:
python -u scripts/gen_data.py
To change the blending or synthesis parameters only, run using:
# Use python scripts/gen_data.py -h for full list of arguments
python -u scripts/gen_data.py --lr <learning rate> \
-m <mesh_name> \
-s <path to save the views> \
-ps <skin threshold> \
-i <blending iterations> \
-v <number of views> \
-n <number of lesions per mesh>
Feel free to play around with the other randomization parameters in configs/blend.yaml to control lighting, materials, and viewpoints.
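If you would rather edit the config programmatically, a hedged sketch like the following loads configs/blend.yaml with PyYAML and writes a modified copy; bodytex_dir and annot_dir are the keys mentioned above and are assumed here to be top-level entries (adjust if they are nested):

```python
# Illustrative sketch (not the repo's own config loader): point blend.yaml
# at a custom data layout and save a modified copy.
import yaml  # PyYAML

with open("configs/blend.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["bodytex_dir"] = "/my/data/3dbodytex-1.1-highres"  # assumed key name
cfg["annot_dir"] = "/my/data/annotations"              # assumed key name

with open("configs/blend_custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```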
We use PyTorch3D as our differentiable renderer to generate synthetic data. However, PyTorch3D is not a physically based renderer (PBR), so its renderings may not look photorealistic. To achieve photorealistic renderings, we use Unity to post-process the renderings obtained from PyTorch3D.
A visual comparison of the renderings obtained from Pytorch3D and Unity (Point Lights and Mixed Lighting).
NOTE: This is an optional step. If you are not interested in creating photorealistic renderings, you can skip this step and use the renderings obtained from PyTorch3D directly. We did not observe a significant difference in the performance of models trained on PyTorch3D renderings versus Unity renderings.
Follow the detailed instructions outlined here to create photorealistic renderings using Unity. Alternatively, download the renders that we created using Unity here.
Preparing Dataset for Experiments

After creating the synthetic dataset in the previous step, it is time to evaluate its utility on some real-world tasks.
Before you start any experiments, you will ideally want to organize the generated data into train/val/test sets.
We provide a utility script to do this:
python scripts/prep_data.py
You can look at scripts/prep_data.py for more details.
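For reference only, the snippet below sketches one possible way to split generated image/target pairs into train/val/test folders; it is not the repository's prep_data.py and assumes matching filenames under synth_data/images and synth_data/targets:

```python
# Illustrative train/val/test split of generated image/target pairs.
import random
import shutil
from pathlib import Path

root = Path("data/synth_data")
images = sorted((root / "images").glob("*.png"))  # assumed output layout
random.seed(0)
random.shuffle(images)

n = len(images)
splits = {"train": images[: int(0.8 * n)],
          "val": images[int(0.8 * n): int(0.9 * n)],
          "test": images[int(0.9 * n):]}

for split, files in splits.items():
    for sub in ("images", "targets"):
        (root / split / sub).mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, root / split / "images" / img.name)
        target = root / "targets" / img.name  # assumes matching filenames
        if target.exists():
            shutil.copy(target, root / split / "targets" / img.name)
```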
If you find this work useful or use any part of the code in this repo, please cite our paper:
@article{sinha2024dermsynth3d,
title={DermSynth3D: Synthesis of in-the-wild annotated dermatology images},
author={Sinha, Ashish and Kawahara, Jeremy and Pakzad, Arezou and Abhishek, Kumar and Ruthven, Matthieu and Ghorbel, Enjie and Kacem, Anis and Aouada, Djamila and Hamarneh, Ghassan},
journal={Medical Image Analysis},
pages={103145},
year={2024},
publisher={Elsevier}
}
Demo Notebooks for Dermatology Tasks

Qualitative results for (a) foot ulcer bounding box detection on the FUSeg dataset, (b) multi-class segmentation (lesions, skin, and background) and in-the-wild body part prediction, (c) skin segmentation and body part prediction on the Pratheepan dataset, and (d) multi-class segmentation (lesions, skin, and background) on dermoscopy images from the PH2 dataset.
Note: Update the paths to the relevant datasets in configs/train_mix.yaml.
To train a lesion segmentation model with default parameters, on a combination of Synthetic and Real Data, simply run:
python -u scripts/train_mix_seg.py
Play around with the following parameters for a combinatorial mix of datasets.
real_ratio: 0.5 # fraction of real images to be used from real dataset
real_batch_ratio: 0.5 # fraction of real samples in each batch
pretrain: True # use pretrained DeepLabV3 weights
mode: 1.0 # Fraction of the number of synthetic images to be used for training
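To make the effect of real_batch_ratio concrete, here is a toy illustration (not the repository's sampler) of how a mixed batch could be composed:

```python
# Toy example: with batch_size 16 and real_batch_ratio 0.5, half of each
# batch is drawn from the real dataset and half from the synthetic one.
import random

batch_size = 16
real_batch_ratio = 0.5

real_pool = [f"real_{i}" for i in range(100)]     # stand-ins for real samples
synth_pool = [f"synth_{i}" for i in range(1000)]  # stand-ins for synthetic samples

n_real = int(batch_size * real_batch_ratio)
batch = random.sample(real_pool, n_real) + random.sample(synth_pool, batch_size - n_real)
random.shuffle(batch)
print(batch)
```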
You can also look at this notebook for a quick overview of training a lesion segmentation model.
For inference of pre-trained models/checkpoints, look at this notebook.
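For orientation, a hedged sketch of segmentation inference with a torchvision DeepLabV3 model is shown below; the checkpoint path, class count, and preprocessing are assumptions, and the linked notebook shows the exact setup used in this work:

```python
# Sketch: load a DeepLabV3 segmentation checkpoint and run inference on one image.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

num_classes = 3  # assumed: lesion / skin / background
model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=num_classes)
state = torch.load("out/lesion_seg_checkpoint.pth", map_location="cpu")  # hypothetical path
model.load_state_dict(state)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))["out"]  # (1, num_classes, H, W)
pred = logits.argmax(dim=1).squeeze(0)  # per-pixel class indices
```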
We also train a multi-task model for jointly predicting lesion segmentation, anatomy labels, and depth, and evaluate it on multiple datasets.
For a quick overview of the multi-task prediction task, check out this notebook.
To perform inference with your trained models for this task, first update the paths in configs/multitask.yaml, then run:
python -u scripts/infer_multi_task.py
For a quick overview of training lesion detection models, please have a look at this notebook.
For quick inference using the pre-trained detection models/checkpoints, have a look at this notebook.
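As a rough orientation only, the snippet below runs a torchvision Faster R-CNN detector on a single image and keeps high-confidence boxes; the architecture choice, checkpoint path, and score threshold are assumptions, and the linked notebook documents the actual detection setup:

```python
# Sketch: single-image inference with a torchvision Faster R-CNN detector.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)  # background + lesion (assumed)
model.load_state_dict(torch.load("out/lesion_det_checkpoint.pth", map_location="cpu"))  # hypothetical path
model.eval()

image = to_tensor(Image.open("example.jpg").convert("RGB"))
with torch.no_grad():
    output = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

keep = output["scores"] > 0.5  # assumed confidence threshold
print(output["boxes"][keep])
```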
We are thankful to the authors of Skin3D for making their code and data public for the task of lesion detection on the 3DBodyTex.v1 dataset.