Understanding a patient’s emotional and physical condition is crucial when designing patient-computer interaction systems. However, gathering large datasets in sensitive situations, such as filming a person in pain, is both challenging and ethically questionable.
The primary aim of this study is to assess whether synthetic data can serve as an alternative data source for building models that effectively recognize patient pain. First, a synthetic dataset was generated as the foundation for model development. To keep the synthetic dataset diverse and relevant, 3D models of real people were created by extracting facial landmarks from a source dataset and generating 3D meshes with EMOCA (Emotion Driven Monocular Face Capture and Animation). Facial textures were sourced from publicly available datasets such as CelebV-HQ and FFHQ-UV.
An efficient pipeline for human mesh and texture generation was built, producing a dataset of 8,600 synthetic human heads in approximately 2 hours per perspective and texture. The datasets cover varying facial textures and perspectives and total over 300 GB. This approach enhances gender and ethnic diversity while introducing previously unseen viewpoints.
Combining the 3D models with the extracted textures creates new characters with varying facial textures but identical facial expressions. The study aims to bridge the gap between synthetic data and real-world medical contexts using domain adaptation methods such as Domain Mapping. This approach eliminates the need for human participants and addresses the ethical issues associated with traditional data collection.
Different combinations of datasets, covering various textures and perspectives, were used to train models and assess the feasibility of synthetic data for domain adaptation (Domain Mapping) with real human video as input.
Notably, combining synthetic and real data improves pain recognition: the combined approach leverages the strengths of both data sources and yields a more robust and effective pain recognition model.
To generate the meshes, we use the EMOCA repository; to create the textures, we use FFHQ-UV. Video rendering is done with Blender.
This code uses separate conda environments for the FFHQ-UV and EMOCA repositories, one for each generation task.
- Install conda
- Run `pull_submodules.sh` to fetch the submodules
- Follow the setup instructions in the EMOCA repository
- Follow the setup instructions in the FFHQ-UV repository (a command-line sketch of these steps follows below)
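As a rough sketch, the installation might look like the following on the command line. The repository URL and the environment file names are placeholders; follow the EMOCA and FFHQ-UV instructions for the exact environment setup.

```bash
# clone this repository (URL is a placeholder) and enter it
git clone https://github.com/<user>/<this-repo>.git
cd <this-repo>

# fetch the EMOCA and FFHQ-UV submodules
bash pull_submodules.sh

# create one conda environment per submodule, following the instructions
# in the respective repositories (file names below are placeholders)
conda env create -f <emoca-environment>.yml
conda env create -f <ffhq-uv-environment>.yml
```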
(Because a Slurm cluster management system is used, Blender is run inside a container so that rendering can be parallelized.)
To render a sequence of meshes, the Blender plugin Stop-motion-OBJ is used:
- Download the latest plugin version that is compatible with Blender LTS 3.6
- For easier setup, install Blender LTS 3.6 on your local machine and install the plugin there
- After the plugin and Blender are installed, save Blender's config folder so it can later be mounted into the container (for the location of this folder, see the Blender documentation)
- Create a working folder containing the following subfolders (the layout is sketched after this list):
  - `blender`: the .blend files with the current configuration of camera perspective, lighting, etc., plus the Stop-motion-OBJ folder and the Blender config folder
  - `all_mesh`: the mesh files
  - `render`: the render scripts
  - `ffhq_textures`: the texture files
  - `videos_mesh`: output folder where the rendered videos are saved
If you use a different folder structure, adjust the mounted folders in `generating_threading.py` (lines 57 to 64).
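As a sketch, the working folder described above could be prepared as follows. The parent folder name `workdir` and the source paths are placeholders; on Linux, Blender's config folder is typically found at `~/.config/blender/3.6/config`.

```bash
# create the working folder that is later mounted into the container
mkdir -p workdir/{blender,all_mesh,render,ffhq_textures,videos_mesh}

# copy the scene file, the Stop-motion-OBJ plugin, and the saved Blender
# config folder into the blender subfolder (source paths are placeholders)
cp /path/to/scene.blend workdir/blender/
cp -r /path/to/Stop-motion-OBJ workdir/blender/
cp -r ~/.config/blender/3.6/config workdir/blender/
```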
Create the conda environment from `environment_model.yml`
The scripts are tailored to a Slurm cluster. Use and/or adjust the `create_emoca_mesh.sh` script (a usage sketch follows below).
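A minimal usage sketch, assuming jobs are submitted via Slurm; whether `sbatch` or a direct `bash` call is appropriate depends on your cluster setup.

```bash
# create the conda environment used for mesh generation and training
conda env create -f environment_model.yml

# submit the EMOCA mesh-generation job to Slurm
# (or run it directly with `bash create_emoca_mesh.sh` if no scheduler is used)
sbatch create_emoca_mesh.sh
```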
- Create texture: run `create_texture.sh` with the arguments `--input_dir` and `--output_dir`
- Apply UV mapping with the textures and the meshes: run `apply_texture_with_mesh.sh` and adjust the arguments `--input_dir` and `--thread_num` (this is an in-place operation; a combined usage sketch follows below)
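Taken together, the two texture steps might look like this. The directory names and the thread count are placeholders chosen to match the folder layout above.

```bash
# 1) create the textures from the data in --input_dir and write them to --output_dir
bash create_texture.sh --input_dir /path/to/input_images --output_dir ffhq_textures

# 2) apply UV mapping to the meshes with the generated textures
#    (this modifies the mesh files in place)
bash apply_texture_with_mesh.sh --input_dir all_mesh --thread_num 8
```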
(To render videos, a .blend file is needed in which the camera, the lighting, etc. are set up.)
Run `start_render_mesh.sh` and enter the parameters (see the sketch below).
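A minimal sketch of this step; whether the parameters are prompted for interactively or passed on the command line depends on the script.

```bash
# start the (containerized) rendering of the mesh sequences and provide
# the requested parameters, e.g. which prepared .blend scene to use
bash start_render_mesh.sh
```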
Adjust the parameters in `train_model_binary.sh` or `train_model_multi.sh` and run the script (see the sketch below).
Further adjustable parameters can be found in the Python code `model_slowfast.py`.
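A possible invocation, assuming the training jobs are submitted through Slurm as in the earlier steps.

```bash
# binary classification variant
sbatch train_model_binary.sh

# multi-class variant
sbatch train_model_multi.sh

# further hyperparameters can be changed directly in model_slowfast.py
```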
This project was created as part of my bachelor thesis, in cooperation with the University of Stuttgart and the Institute for Artificial Intelligence in Medicine (IKIM) in Essen.
For organizational questions, please contact Jun.Prof. Dr.-Ing. Alina Roitberg or Dr.-Ing. Constantin Seibold.
These implementations are based on the FFHQ-UV and EMOCA repositories.
Big thanks to Jun.Prof. Dr.-Ing. Alina Roitberg and Dr.-Ing. Constantin Seibold for their support in the project.
If you use this work, please cite:
```bibtex
@misc{nasimzada2024syntheticdatagenerationimproved,
      title={Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints},
      author={Jonas Nasimzada and Jens Kleesiek and Ken Herrmann and Alina Roitberg and Constantin Seibold},
      year={2024},
      eprint={2409.16382},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.16382},
}
```